Probabilistic atlas for the language network based on precision fMRI data from >800 individuals

Two analytic traditions characterize fMRI language research. One relies on averaging activations across individuals. This approach has limitations: because of inter-individual variability in the locations of language areas, any given voxel/vertex in a common brain space is part of the language network in some individuals but in others, may belong to a distinct network. An alternative approach relies on identifying language areas in each individual using a functional ‘localizer’. Because of its greater sensitivity, functional resolution, and interpretability, functional localization is gaining popularity, but it is not always feasible, and cannot be applied retroactively to past studies. To bridge these disjoint approaches, we created a probabilistic functional atlas using fMRI data for an extensively validated language localizer in 806 individuals. This atlas enables estimating the probability that any given location in a common space belongs to the language network, and thus can help interpret group-level activation peaks and lesion locations, or select voxels/electrodes for analysis. More meaningful comparisons of findings across studies should increase robustness and replicability in language research.

Language atlas topography. Probabilistic functional atlas for the language > control contrast based on overlaid individual binarized activation maps (where in each map, the top 10% of voxels are selected, as described in the text). (a) SPM-analyzed volume data in the MNI template space (based on 806 individual maps). (b) FreeSurfer-analyzed surface data in the FSaverage template space (based on 804 individual maps). In both figures, the color scale reflects the proportion of participants for whom that voxel/vertex belongs to the top 10% of language > control voxels/vertices (threholded at p = 0.2 for visualization purposes).
as NeuroVault 52 and ENIGMA 53 . We emphasize that LanA is not a replacement for localizers: when possible, a language localizer task should be performed 54 . As we show in SI-1, the effect sizes obtained from group-level regions of interest (ROIs) based on LanA, or from commonly used Glasser parcels 55 are underestimated relative to individually defined language functional ROIs.
We also release (i) individual activation maps (in the MNI and FSaverage spaces), along with demographic data, and (ii) individual-level neural markers (based on the volumetric analyses), including effect sizes, voxel count (activation extent), and stability of activation across runs. The neural marker data can be used as normative distributions, based on neurotypical relatively young adults, against which any new population (e.g., children or individuals with developmental or acquired brain disorders) can be evaluated.

Participants.
A total of 806 neurotypical adults (477, ~59%, female), aged 19 to 75 (441, ~55%, aged 19-29; 310, ~38%, aged 30-39; 55, ~7%, aged 40+), participated for payment between September 2007 and June 2021, as summarized in Table 1. All participants had normal or corrected-to-normal vision, and no history of neurological, developmental, or language impairments. Handedness information was collected for 758 (~94%) of the 806 participants. Of those, 707 participants (~93%) were right-handed, as determined by the Edinburgh handedness inventory 56 or self-report, 38 (~5%) were left-handed, and 13 (~2%) -ambidextrous. (The participants for whom handedness is missing in the database are most likely right-handed because most of them were tested during the earlier years of data collection when right-handedness was one of the requirements for participation.) Of the 806 participants, 629 (~78%) were native speakers of English, and the remaining 177 (~22%) -native speakers of another language and proficient speakers of English (see 38 for evidence that the topography of language responses for a language that an individual is proficient in is similar to that of their native language, and see SI-2 for a comparison between the atlas generated using the 629 native English speakers vs. the 177 proficient non-native speakers). Given this demographic distribution, this atlas represents certain populations better than others, and these biases should be taken into account when the data are interpreted, including in comparison to other populations.
Each participant completed a language localizer task 10 as part of one of the studies in the Fedorenko lab. Each scanning session lasted between 1 and 2 hours and included a variety of additional tasks. All participants gave informed written consent in accordance with the requirements of the MIT's Committee on the Use of Humans as Experimental Subjects (COUHES).
Participant and session selection. The 806 scanning sessions above (one session per participant) were selected from a total of 1,065 sessions across 819 participants that were available in the Fedorenko Lab database as of June 2021. The goal was to include as many participants as possible and, for the 163 participants who performed a language localizer in multiple sessions, to select a single session with high-quality data. To assess data quality, we examined the stability of the activation topography for the language localizer contrast (see Language localizer paradigm) across runs. This analysis was performed on the data preprocessed and analyzed in the volume (i.e., SPM-based analyses; see SPM preprocessing and analysis pipeline). For 1,062 of the 1,065 sessions, we calculated voxel-wise spatial correlations in activation magnitudes the language > control contrast (see Language localizer paradigm) between the odd-numbered and even-numbered runs (the three remaining sessions consisted of a single run and were evaluated by visual inspection of the contrast maps). The correlation values were calculated within the language 'parcels'-masks that denote typical locations of language areas. These masks (available at 54 http://evlab.mit.edu/funcloc) were derived from a probabilistic language atlas based on 220 participants (a subset of the participants in the current set of 806) and have been used in much past work [57][58][59][60][61][62][63] . Six masks (three in the frontal cortex and three in the temporal and parietal cortex) were derived from the probabilistic atlas in the left hemisphere and mirror-projected onto the right hemisphere. For each session, the correlation values were averaged across the twelve parcels, leading to a single value per session. This spatial correlation measure quantifies the stability of the activation landscape and is an objective proxy for data quality; it is affected by factors like head motion or sleepiness, but does not require subjective visual inspection of contrast maps (see SI-4 for evidence that this measure works similarly when considering all voxels vs. only voxels with positive values for the contrast of interest, suggesting that the values are not driven by the difference between responsive and non-responsive voxels). Sessions where the spatial correlation value was negative (n = 23; ~2%) were excluded, leaving 1,042 sessions across 806 participants. For the 163 participants with more than one session, we selected the session with the highest spatial correlation value for inclusion in the atlas (see 11 for evidence of the stability of spatial correlation www.nature.com/scientificdata www.nature.com/scientificdata/ values across sessions: i.e., if a participant shows a high spatial correlation in one session, they are likely to show a high spatial correlation in another session; unpublished data replicates this result across a larger population and multiple functionally distinct networks). Following this data selection procedure, the Fisher-transformed spatial correlations of the participants' language > control contrast were r = 0.98 and r = 0.57 for the left and right hemispheres, respectively (see 11 for similar values on a subset (n = 150) of these participants).
Language localizer paradigm. Across the 806 participants, ten language localizer versions were used, as summarized in Table 2. In each version, a sentence comprehension condition was contrasted with a linguistically or acoustically degraded control condition. Visual (reading) and auditory (listening) contrasts have been previously established to engage the same fronto-temporal language network 10,38,64,65 . Activity in this network has further been shown to not depend on task or materials 10 and to show robust effects across typologically diverse languages 38 . Furthermore, this network can be recovered from naturalistic task-free (resting state) data based on patterns of BOLD signal fluctuations over time 33,37 , and corresponds nearly perfectly to the network based on the sentences > nonwords contrast 10 . As a result, we pooled data from across the different localizer versions in the current study (see SI-2 for evidence that an atlas defined on only Localizer Version A, used in the majority of participants, is nearly identical to an atlas that leverages data from all other versions, and SI-3 for a supplementary analysis showing robust language > control effects across all ten versions).
The vast majority of participants (624, ~77.4%) performed Localizer version A -a reading version, where sentences and nonword strings are presented one word/nonword at a time at the rate of 450 ms per word/nonword, in a blocked design (with 3 sentences/nonword strings in each 18 s block). Participants were instructed to read attentively and to press a button at the end of each trial, when a picture of a finger pressing a button appeared on the screen. The experiment consisted of two ~6-minute-long runs, for a total of 16 blocks for each of the two conditions. The presentation script and stimuli for this localizer version can be downloaded at 54 http://evlab.mit.edu/funcloc/ (for the stimuli used in the other localizer versions, contact EF). Localizer versions  www.nature.com/scientificdata www.nature.com/scientificdata/ B-G (performed by 169 participants, ~21.0%) also used visual presentation, and Localizer versions H-J (performed by 13 participants, ~1.6%) used auditory presentation. Details of the similarities and differences in trial structure, timing, and other experimental parameters across versions are summarized in Table 2.
fMRI data acquisition. Structural and functional data were collected on the whole-body, 3 Tesla, Siemens Trio scanner with a 12-channel (G1; n = 18) or 32-channel (G2; n = 788) head coil, at the Athinoula A. Martinos Imaging Center at the McGovern Institute for Brain Research at MIT. T1-weighted structural images were collected in 176 sagittal slices with 1 mm isotropic voxels (TR = 2,530 ms, TE = 3.48 ms). Functional, blood oxygenation level dependent (BOLD), data were acquired using an EPI sequence (with a 90 degree flip angle and using GRAPPA with an acceleration factor of 2), with the following acquisition parameters: 33 (G1) or 31 (G2) 4 mm thick near-axial slices acquired in the interleaved order (with 10% distance factor), 3.0 mm × 3.0 mm (G1) or 2.1 mm × 2.1 mm (G2) in-plane resolution, FoV in the phase encoding (A ≫ P) direction 192 mm (G1) or 200 mm (G2) and matrix size 64 mm × 64 mm (G1) or 96 mm × 96 mm (G2), TR = 2,000 ms and TE = 30 ms. Prospective acquisition correction 66 was used to adjust the positions of the gradients based on the participant's motion from the previous TR. The first 10 s of each run were excluded to allow for steady state magnetization.

SPM preprocessing and analysis pipeline.
Preprocessing. For the SPM analyses ( Fig. 2 [volume]), fMRI data were analyzed using SPM12 (release 7487), CONN EvLab module (release 19b), and custom MATLAB scripts. Each participant's functional and structural data were converted from DICOM to NIfTI format. All functional scans were coregistered and resampled using B-spline interpolation to the first scan of the first session (Friston et al. 67 ). Potential outlier scans were identified from the resulting subject-motion estimates as well as from BOLD signal indicators using default thresholds in CONN preprocessing pipeline (5 standard deviations above the mean in global BOLD signal change, or framewise displacement values above 0.9 mm; Nieto-Castañón 68 ). Functional and structural data were independently normalized into a common space (the MontrealNeurological Institute [MNI] template; IXI549Space) using SPM12 unified segmentation and normalization procedure (Ashburner & Friston 69 ) with a reference functional image computed as the mean functional data after realignment across all timepoints omitting outlier scans. The output data were resampled to a common bounding box between MNI-space coordinates (−90, −126, −72) and (90, 90, 108), using 2 mm isotropic voxels and 4th order spline interpolation for the functional data, and 1 mm isotropic voxels and trilinear interpolation for the structural data. Last, the functional data were smoothed spatially using spatial convolution with a 4 mm FWHM Gaussian kernel.
First-level analysis. Effects were estimated using a General Linear Model (GLM) in which each experimental condition was modeled with a boxcar function convolved with the canonical hemodynamic response function (HRF) (fixation was modeled implicitly, such that all timepoints that did not correspond to one of the conditions www.nature.com/scientificdata www.nature.com/scientificdata/ were assumed to correspond to a fixation period). Temporal autocorrelations in the BOLD signal timeseries were accounted for by a combination of high-pass filtering with a 128 seconds cutoff, and whitening using an AR(0.2) model (first-order autoregressive model linearized around the coefficient a = 0.2) to approximate the observed covariance of the functional data in the context of Restricted Maximum Likelihood estimation (ReML). In addition to experimental condition effects, the GLM design included first-order temporal derivatives for each condition (included to model variability in the HRF delays), as well as nuisance regressors to control for the effect of slow linear drifts, subject-motion parameters, and potential outlier scans on the BOLD signal.
FreeSurfer preprocessing and analysis pipeline. For the FreeSurfer analysis (Fig. 2 [surface]), fMRI data were analyzed using FreeSurfer v6.0.0. Each participant's functional and structural data were converted from DICOM to NIfTI format using the default unpacksdcmdir parameters. (Two of the 806 participants could not be included in this pipeline because their raw dicom files were lost, leaving 804 participants for this analysis.) The raw data were then sampled onto both hemispheres of the FSaverage surface, motion corrected and registered using the middle time point of each run. The data were then smoothed spatially with a 4 mm FWHM Gaussian filter. For the first-level analyses, effects were estimated using a GLM in which each condition was modeled with a first order polynomial regressor fitting the canonical HRF. The GLM also included nuisance regressors for offline-estimated subject-motion parameters.
Atlas creation. SPM. Using custom code (available at OSF 70 ), we computed the overlap of the individual activation maps for the language > control contrast using the 806 participants analyzed in the SPM12 pipeline (see SI-5 for evidence that the atlas reaches stability at sample sizes much smaller than 806, which suggests that the current sample size is sufficient to be generalizable). In particular, we used whole-brain t-maps that were generated by the first-level analysis and that contain a t-value for the relevant contrast in each voxel (a post-hoc analysis compared the whole-brain t-maps to their corresponding unscaled contrast maps and found strong voxel-wise correlations over the set of 806 participants: r = 0.93 ± 0.03; see SI-2 for evidence that atlases generated from t-maps vs. contrast maps are highly similar). In each individual map, we selected the 10% of voxels across the brain with the highest t-values for the language > control contrast (average and median minimum t-values across participants were 1.73 and 1.62, respectively; average and median maximum t-values were 13.8 and 13.7, respectively). These maps were then binarized so that the selected voxels got assigned a value of 1 and the remaining voxels-a value of 0. Finally, these values were averaged across participants in each voxel. The resulting atlas contains in each voxel a value between 0 and 1, which corresponds to the proportion of the 806 participants for whom that voxel falls in the 10% of voxels across the brain with the highest t-values for the language > control contrast. In the left hemisphere, these values range from 0 to 0.82, and in the RH-from 0 to 0.64 (the values are lower in the RH presumably because the majority of the selected voxels fall in the LH: across participants, average and median proportions of selected voxels falling in the LH are 58.3% and 57.8%, respectively). For more details on ROI-level probability values, see SI-6.
FreeSurfer. Using custom code (available at OSF 70 ), we computed the overlap of the individual activation maps for the language > control contrast, using the 804 participants analyzed in the FreeSurfer pipeline. The procedure was similar to that used for the SPM-based atlas, except that the selection of the highest t-values was performed on the surface vertices. To maintain hemispheric asymmetries, rather than evaluating each hemisphere separately, as is generally common for FreeSurfer analyses, the top 10% of vertices were selected from the vertices pooled across the LH and RH, as for the SPM-based atlas. For this atlas, in the left hemisphere, the proportion values range from 0 to 0.90, and in the RH-from 0 to 0.80 (these values are expectedly higher than those in the SPM-based atlas given the superiority of surface-based inter-individual alignment 71 ).
General. We chose the top 10% approach over an approach where each individual map is thresholded at a fixed t-value (as in 10 ) to account for inter-individual variability in the overall strength of BOLD signal responses due to trait or state factors [72][73][74][75] . However, because differences in the language network size may correspond to differences in linguistic experience or ability 76 , we additionally provide versions of the atlas that are derived from t-maps that are thresholded at p < 0.001, p < 0.01, or p < 0.05 (https://doi.org/10.17605/OSF.IO/KZWBH 70 ). Atlases based on the fixed t-value thresholding approach yield topographies that are very similar to the one based on the top 10% approach (see SI-2). The critical difference between these versions of the atlas is in the interpretation of the overlap values: whereas, as noted above, in the top 10% approach, the overlap values correspond to the proportion of the 806 participants for whom that voxel falls in the 10% of voxels across the brain with the highest t-values for the language > control contrast, in the atlases based on the fixed-thresholding approach, the overlap values correspond to the proportion of the 806 participants for whom that voxel is significant for the language > control contrast at the relevant threshold.
Note that in addition to the classic left frontal and left temporal areas (and their right-hemispheric homotopes), several other areas emerge in the atlas, including in the right cerebellum and in the visual cortex. These less canonical areas have been reported in past language studies (e.g. 77,78 ), but we acknowledge that in general, these have not been as thoroughly functionally characterized as the core frontal and temporal areas, and may in future work be shown not to be selective and/or critical for language function.
Finally, one might ask: how similar is the topography of a probabilistic functional atlas to that of a random-effects group map for the same data. Of course, these are expected to be correlated given that voxels which are task-responsive in a greater portion of the participant population (i.e., have higher probability overlap values in the atlas) are likely to yield higher t-values in the voxel-wise t-tests (see SI-7 for this comparison for LanA). The critical advantage of the probabilistic functional atlas like LanA over a random-effects map is the www.nature.com/scientificdata www.nature.com/scientificdata/ straightforward interpretation of the voxel values that it affords, in terms of the probability that the voxel belongs to the relevant functional area/network (the language network in this case). Such information cannot be inferred from t-values in a random-effect map without additional assumptions/mapping functions.

Neural markers.
In addition to the population-level atlases, we also provide a set of individual-level neural markers (based on the volumetric SPM analyses) for the language network in each participant. These neural markers include: effect size, voxel count, and spatial correlation (additional information on these markers provided below). All of these markers have all been shown to be reliable within individuals over time, including across scanning sessions 11 . We provide each of these measures for each of the ROIs constrained by the previously defined language 'parcels' (available at 54 http://evlab.mit.edu/funcloc), which include in each hemisphere three frontal parcels (inferior frontal gyrus [IFG], its orbital portion [IFGorb], and middle frontal gyrus [MFG]) and three temporal/parietal ones (anterior temporal [AntTemp], posterior temporal [PostTemp], and angular gyrus [AngG]), for a total of 12 parcels. Of the 806 participants included in the atlas, only 803 completed the 2 or more runs, as needed to calculate the effect size and spatial correlation markers; for the remaining 3 participants, only voxel count is provided.
Effect size was operationalized as the magnitude (% BOLD signal change) of the critical language > control contrast. Within each parcel, we defined-for each participant-a functional ROI (fROI) by selecting 10% of the mask's total voxels with the highest t-values for the language > control contrast using all but one run of the data. We then extracted from the left-out run the responses to the language and control conditions and computed the language > control difference. This procedure was repeated across all run partitions. This across-runs cross-validation procedure 3 ensures independence between the data used to define the fROIs and estimate their responses 79 . In the final step, the estimates were averaged across the cross-validation folds to derive a single value per participant per fROI. Voxel count (activation extent) was defined as the number of significant voxels for the critical language > control contrast at a fixed statistical threshold (p < 0.001 uncorrected threshold). Spatial correlation (stability of the activation landscape) was defined-for the voxels falling within the language parcels-as the Fisher-transformed Pearson correlation coefficient between the voxel responses for the language > control contrast across odd-and even-numbered runs. As noted above, for all three measures, we provide 14 values per participant: one for each of the 12 ROIs (6 in each hemisphere), and two additional per-hemisphere values (averaging across the 6 ROIs in each hemisphere). See Table 3 for a summary of these neural markers within the atlas population. Additional measures can be computed based on the measures we provide (e.g., lateralization can be computed from the voxel counts 80 ), and other measures can be extracted from the whole-brain activation maps (see Data records).
These different measures can be explored with respect to each other, or to the demographic variables (but see 81 for a discussion about the prevalence of underpowered brain-behavior individual differences studies). These measures can also serve as normative distributions against which any new population can be evaluated, including children or individuals with developmental and acquired brain disorders, or otherwise atypical brains 82 .

Data Records
The full dataset, including the SPM and FreeSurfer atlases are available for download 83 (https://doi.org/10.6084/ m9.figshare.20425209). Along with the atlases, we make available i) individual contrast and significance maps (for both the volume-based SPM and the surface-based FreeSurfer pipelines; because we had not obtained consent for raw data release, we cannot make publicly available the raw dicom/NIfTI images), and ii) a dataset of individual neural markers.
The complete dataset can additionally be accessed at http://evlabwebapps.mit.edu/langatlas/ via the prepackaged download links. The 'Download SPM Atlas' and 'Download FS Atlas' links provide a copy of the language atlas in their respective formats. The SPM atlas is a single volumetric NIfTI file, whereas the FS atlas is comprised of two overlay NIfTI files, one for each hemisphere. Under 'Download All SPM Data' and 'Download All FS Data' , each of the individual participant's data can be downloaded. In particular, for each of the 806  Table 3. Neural marker distributions. Summary of the neural markers for the language > control contrast of the 803 participants included in the atlas for whom we have 2 or more runs. Effect sizes reflect the % BOLD signal change for the language > control contrast in the language fROIs (estimated using across-runs cross-validation, as described in the text). Voxel counts reflect the number of significant voxels for the critical language > control contrast at a fixed statistical threshold (p < 0.001 uncorrected) within the language parcel boundaries (see text for details; Neural markers). Spatial correlation is defined as the Fisher-transformed Pearson correlation coefficient between the voxel responses for the language > control contrast across odd-and even-numbered runs within the language parcel boundaries. LH = Left Hemisphere; RH = Right Hemisphere. The columns show the values that correspond to the minimum value, the value at the 25 th percentile of the population distribution, the median, the value at the 75 th percentile, and the maximum value.
www.nature.com/scientificdata www.nature.com/scientificdata/ participants (804 for FS), we provide a 'Demograhics_&_Summary.txt' file, which contains relevant information as in Tables 2 and 3, the individual contrast and significance maps, and a visualization of their individual activation profile in the selected template space.
As well as allowing the user to download the data, the LanA website offers opportunities for online exploration and the retrieval of relevant subsets of data. In particular, individual activation maps can be explored under the 'Explore Activation Maps' tab, and relevant neural markers can be explored under the 'Explore Neural Markers' tab. In addition, data can be filtered by demographic variables including, age, gender, handedness, native English speaker status, language network lateralization, etc., and these subsets can be downloaded, or their maps/neural markers can be explored. This flexible tool allows individual users to access relevant data for their needs without the requirement for offline filtering.
Finally, at 54 , we provide a version of the language localizer experiment (Localizer Version A, which is used for the majority of participants) for download.

technical Validation
The individual participants' data quality check was performed as described in the Participants and session selection section.
Individual localizer versions were evaluated to confirm they each elicited a strong language > control effect, as described in SI-3.
The atlas creation process was evaluated with respect to several hyperparameter choices, and the atlas remained robust to each decision, including the inclusion of non-native but proficient English speakers, different localizer versions, the use of whole-brain maps based on t-values vs. contrast values, and definition of the language system as the top 10% of language > control voxels vs. as language > control voxels that pass a specific significance threshold. We summarize the (minimal) impact of all these choices in SI-2.
Finally, in SI-1, we demonstrate that group-level ROIs defined based on the highest-overlap voxels in LanA outperform commonly used Glasser ROIs derived from multi-modal Human Connectome Project data 55 in effect size estimation. The latter grossly underestimate effect sizes, especially for the frontal language areas. Of course, as expected 3 , individual-level language fROIs are still the best for accurately estimating effect sizes, and these outperform the group-based LanA fROIs, but in cases where individual localization may not be possible (e.g., in retroactively re-analyzing past studies), LanA-based group ROIs are recommended, as they fare substantially better than Glasser ROIs.

Usage Notes
The data records presented in this paper, including materials for download and exploration at the http:// evlabwebapps.mit.edu are available for free and fair use to individual and academic researchers, institutions, and entities provided that this work is appropriately referenced. Although this atlas has potential for clinical applications, the authors assume no responsibility for the use or misuse of LanA and associated data records in clinical and other settings.

Code availability
Code associated with this manuscript can be found at OSF 70 .